Abstract: Conventional distributed arithmetic (DA) is popular in field programmable gate array (FPGA)
design, where it relies on on-chip ROM to achieve high speed and regularity. In this paper, we describe a
high-speed, area-efficient 1-D discrete wavelet transform (DWT) using a 9/7 filter based on the new efficient
distributed arithmetic (NEDA) technique. Being an area-efficient architecture free of ROM, multiplication, and
subtraction, NEDA can also expose the redundancy existing in the adder array, whose entries are 0 and 1. This
architecture supports any size of image pixel value and any level of decomposition. The parallel structure has
100% hardware utilization efficiency.
I. Introduction


The discrete wavelet transform (DWT) is a mathematical
technique for signal processing that decomposes a
discrete signal in the time domain using
dilated/contracted and translated versions of a single
basis function, called the prototype wavelet [Mallat
(1989a); Mallat (1989b); Daubechies (1992); Meyer
(1993); Vetterli and Kovacevic (1995)]. The DWT offers
a wide variety of useful features over other unitary
transforms such as the discrete Fourier transform (DFT),
discrete cosine transform (DCT) and discrete sine
transform (DST). Some of these features are adaptive
time-frequency windows, lower aliasing distortion for
signal processing applications, efficient
computational complexity and inherent scalability
[Grzesczak et al. (1996)]. Owing to these features, the
one-dimensional (1-D) DWT and two-dimensional (2-D)
DWT are applied in various applications such as
numerical analysis [Beylkin et al. (1992)], signal
analysis [Akansu and Haddad (1992)], image coding
[Sodagar et al. (1999); Taubman (2000)], pattern
recognition [Kronland et al. (1987)], statistics
[Stoksik et al. (1994)] and biomedicine [Senhadji et
al. (1994)]. Several algorithms and computation
schemes have been suggested during the last three
decades for efficient hardware implementation of the
1-D DWT and 2-D DWT.

The DWT is computationally intensive, and most of
its applications demand real-time processing. One way
of achieving high-speed performance is to use a fast
computational algorithm on a general purpose computer.
Another way is to exploit the parallelism inherent in
the computation for concurrent processing by a set of
parallel processors. However, it is not cost effective
to use a general purpose computer for a specific
application; such an implementation also requires more
space, more power and more computation time. The
development of very large scale integration (VLSI)
technology enables digital signal processing (DSP)
system designers to design a high-performance, low-cost
and low-power system on a single chip. VLSI systems
offer great potential for concurrency and an enormous
amount of computing power within a small area [Weste
and Eshraghian (1993)]. Computation is very cheap, as
hardware is not an obstacle for a VLSI system.
Non-localized global communication, however, is not
only expensive but also demands high power dissipation.
Thus, a high degree of parallelism and nearest-neighbor
communication are crucial for the realization of a
high-performance VLSI system [Kung (1982)]. Keeping
this in view, high-performance application-specific
VLSI systems have been evolving rapidly in recent
years. Special purpose VLSI systems maximize processing
concurrency by parallel/pipeline processing and provide
a cost-effective alternative for real-time
applications. Therefore, the 2-D DWT is currently
implemented in VLSI systems to meet the temporal
requirements of real-time applications. Accordingly,
several design schemes have been suggested in the
last two decades for efficient implementation of the
2-D DWT in a VLSI system. Researchers have adopted
different algorithm formulations, mapping schemes,






INTERNATIONAL JOURNAL OL SCIENCE 

ISSN: 2455-0108 | WWW.IJOSCIENCE.COM 
VOLUME II ISSUE I FEBRUARY 2016 


and architectural design methods to reduce the
computational time, arithmetic complexity or
memory complexity of 2-D DWT structures.
However, the area-delay performance of the existing
structures changes only marginally. This is mainly due
to the memory complexity, which forms a major
hardware component of a folded 2-D DWT structure. A
detailed study of the existing design methods and a
complexity analysis is made in Chapter 2 to find an
appropriate design strategy to improve the area-delay
performance of 2-D DWT structures.

II. MULTILEVEL DISCRETE WAVELET TRANSFORM

Multiresolution analysis (MRA) is a characteristic
feature of SB and is used for better spectral
representation of the signal. In MRA, the signal is
decomposed over more than one DWT level, which is
known as multilevel DWT: the low-pass output of the
first DWT level is further decomposed in a similar
manner to obtain the second level of DWT
decomposition, and the process is repeated for higher
DWT levels. A few algorithms have been suggested for
computation of the multilevel DWT. One of the most
important is the pyramid algorithm (PA), proposed by
Mallat (1989a) for parallel computation of the
multilevel DWT. The PA for the 1-D DWT is given by

$$Y_l^{j}(n) = \sum_{i=0}^{k-1} h(i)\, Y_l^{j-1}(2n-i) \qquad (1)$$

$$Y_h^{j}(n) = \sum_{i=0}^{k-1} g(i)\, Y_l^{j-1}(2n-i) \qquad (2)$$

where $Y_l^{j}(n)$ is the n-th low-pass sub-band
component of the j-th DWT level, $Y_h^{j}(n)$ is the
n-th high-pass sub-band component of the j-th DWT
level, $h(i)$ and $g(i)$ are the low-pass and high-pass
filter coefficients, and $k$ is the filter length.
Two-dimensional signals, such as images, are
analyzed using the 2-D DWT. Currently the 2-D DWT is
applied in many image processing applications such
as image compression and reconstruction [Lewis and
Knowles (1992)], pattern recognition [Kronland et al.
(1987)], biomedicine [Senhadji et al. (1994)] and
computer graphics [Meyer (1993)]. The 2-D DWT is
a mathematical technique that decomposes an input
image in the multiresolution frequency space. It
decomposes an input image into four sub-bands, known
as the low-low (LL), low-high (LH), high-low (HL) and
high-high (HH) sub-bands.
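
The recursion of equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pa_level` and `pyramid_dwt` are hypothetical, samples outside the signal are assumed to be zero, and the two-tap averaging/differencing filters in the usage lines are placeholders chosen for easy hand checking, not the 9/7 filter used in the rest of the paper.

```python
def pa_level(prev_low, h, g):
    """One pyramid-algorithm level, following equations (1) and (2)."""
    def sample(idx):
        # Assumption: samples outside the signal are treated as zero.
        return prev_low[idx] if 0 <= idx < len(prev_low) else 0.0

    half = len(prev_low) // 2  # the outputs are decimated by two
    low = [sum(h[i] * sample(2 * n - i) for i in range(len(h)))
           for n in range(half)]
    high = [sum(g[i] * sample(2 * n - i) for i in range(len(g)))
            for n in range(half)]
    return low, high


def pyramid_dwt(x, h, g, levels):
    """Multilevel 1-D DWT: only the low-pass output is decomposed again."""
    low = list(x)
    highs = []
    for _ in range(levels):
        low, high = pa_level(low, h, g)
        highs.append(high)
    return low, highs


# Placeholder two-tap filters (hypothetical, for illustration only).
h_demo = [0.5, 0.5]
g_demo = [0.5, -0.5]
approx, details = pyramid_dwt([2, 4, 6, 8, 10, 12, 14, 16], h_demo, g_demo, levels=2)
```

After two levels, `approx` holds the second-level low-pass sub-band and `details` holds one high-pass sub-band per level, each half the length of the previous one.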
Figure 1: Three-Level Diagram of 2-D Sub-band Wavelet Transform

III. PROPOSED ARCHITECTURE 

The block diagram of the 9/7 wavelet coefficient based
multilevel discrete wavelet transform using the NEDA
structure is shown in Figure 2. In this figure, each
input sample passes through an 8-bit register, after
which the symmetrically delayed inputs are added in
pairs as given in the equations below.

$$r(1) = X(n) + X(n-6)$$

$$r(2) = X(n-1) + X(n-5)$$

$$r(3) = X(n-2) + X(n-4)$$

$$r(4) = X(n-3)$$
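
The pre-addition of symmetric taps can be checked with a short Python sketch. It assumes a 7-tap filter whose coefficients are symmetric about the centre tap; the helper name `symmetric_preadd` and the integer tap values are hypothetical and chosen only so that the folded and direct forms can be compared exactly.

```python
def symmetric_preadd(x, n):
    """Return [r(1), r(2), r(3), r(4)] for output index n (requires n >= 6)."""
    return [
        x[n] + x[n - 6],
        x[n - 1] + x[n - 5],
        x[n - 2] + x[n - 4],
        x[n - 3],  # centre tap has no partner
    ]


x_demo = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary input samples
outer_to_centre = [1, 2, 3, 4]      # hypothetical taps, outermost first
r = symmetric_preadd(x_demo, 7)

# Folded form: four products instead of seven.
folded = sum(c * ri for c, ri in zip(outer_to_centre, r))

# Direct form with the full symmetric 7-tap filter, for comparison.
taps = outer_to_centre + outer_to_centre[-2::-1]   # [1, 2, 3, 4, 3, 2, 1]
direct = sum(taps[k] * x_demo[7 - k] for k in range(7))
```

Both forms give the same result, which is why only four inputs need to be passed on to the NEDA stage for the 7-tap filter.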

We use NEDA in the 9/7 filter to remove the
multipliers. NEDA has to be applied twice to obtain
the high-pass output $Y_{H1}$ and the low-pass output
$Y_{L1}$ of the 1-D 9/7 filter.
NEDA STRUCTURE

where $h_0, h_1, h_2, h_3, h_4$ are the low-pass filter
coefficients and $g_0, g_1, g_2, g_3$ are the high-pass
filter coefficients.

Applying the NEDA technique to the high-pass
coefficients $g_0, g_1, g_2, g_3$ with the inputs
$r_1, r_2, r_3, r_4$ gives the high-pass output $Y_H$
of the 9/7 filter. Applying it to the low-pass
coefficients $h_0, h_1, h_2, h_3, h_4$ with the inputs
$m_1, m_2, m_3, m_4, m_5$ gives the low-pass output
$Y_L$ of the 9/7 filter. The computation of the
low-pass output is worked through step by step below.
Let $m_1 = 1$, $m_2 = 2$, $m_3 = 3$, $m_4 = 4$ and
$m_5 = 5$. Multiplying the coefficient row by the input
column then gives the low-pass output 122. Here
$h_0, h_1, h_2, h_3, h_4$, the Daubechies 9/7 filter
coefficients, are 0.6029490, 0.2668641, -0.0782232,
-0.0168641 and 0.02674875 respectively. Multiplying
each coefficient by 128 and rounding to the nearest
integer gives 77, 34, -10, -2 and 3 respectively.


$$Y_L = \begin{bmatrix} 77 & 34 & -10 & -2 & 3 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix} = 122$$
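
The scaling and the reference inner product can be reproduced with a minimal Python sketch. The coefficient values are the Daubechies 9/7 low-pass values rounded to seven places (an assumption where the printed digits are unclear), and the helper names `quantize` and `inner_product` are hypothetical.

```python
def quantize(coeffs, scale=128):
    """Scale each coefficient and round to the nearest integer."""
    return [round(c * scale) for c in coeffs]


def inner_product(a, b):
    """Plain multiply-accumulate, used here only as a reference result."""
    return sum(x * y for x, y in zip(a, b))


h_float = [0.6029490, 0.2668641, -0.0782232, -0.0168641, 0.02674875]
h_int = quantize(h_float)            # integer coefficients after scaling by 128
m_in = [1, 2, 3, 4, 5]               # example inputs m_1 .. m_5
y_ref = inner_product(h_int, m_in)   # value the NEDA datapath must reproduce
```

This multiplier-based result serves only as the reference against which the adder-only NEDA computation is checked.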


The NEDA technique obtains the same low-pass output
$Y_L$ of the 9/7 filter from the low-pass coefficients
$h_0, h_1, h_2, h_3, h_4$ and the inputs
$m_1, m_2, m_3, m_4, m_5$.
Figure 2: Proposed 1-D Architecture for Low-Pass Filter Using NEDA Technique


In Figure 3.9, the NEDA technique is applied as follows.

Step 1: all the inputs are converted to binary numbers:

$m_1 = 001$, $m_2 = 010$, $m_3 = 011$, $m_4 = 100$, $m_5 = 101$

Step 2: all the binary inputs are sign extended:

$s(1) = 0001$, $s(2) = 0010$, $s(3) = 0011$,
$s(4) = 0100$, $s(5) = 0101$

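Step 3: the sign-extended inputs are applied to the
adder array, which gives

$m(1) = 0110$, $m(2) = 1110$, $m(3) = 1000$, $m(4) = 0101$,
$m(5) = 0111$, $m(6) = 1001$, $m(7) = 1000$,
$m(8) = \mathrm{not}(m_3 + m_4) + 1 = 1001$

The DA matrix of the low-pass filter is formed from
the filter coefficients:

$$\begin{bmatrix}
1 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 0
\end{bmatrix}$$

Each row holds one bit position, from the least
significant bit to the sign bit, of the 8-bit two's
complement codes of 77, 34, -10, -2 and 3. Each
adder-array output is the sum of the inputs selected
by the 1 entries of the corresponding row; for example,
$m(1) = m_1 + m_5 = 0110$. The last row is the sign row,
so its sum $m_3 + m_4$ is negated to give $m(8)$.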
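
The DA matrix and the adder-array outputs of Step 3 can be regenerated with a short Python sketch. It assumes 8-bit two's complement coefficients with rows ordered from the least significant bit to the sign bit; the function names `da_matrix` and `adder_array` are hypothetical.

```python
def da_matrix(coeffs, bits=8):
    """Row k holds bit k (LSB first) of each coefficient's two's complement code."""
    mask = (1 << bits) - 1
    return [[((c & mask) >> k) & 1 for c in coeffs] for k in range(bits)]


def adder_array(matrix, inputs):
    """For each row, add the inputs selected by the 1 entries (no multipliers)."""
    return [sum(m for bit, m in zip(row, inputs) if bit) for row in matrix]


h_q = [77, 34, -10, -2, 3]      # scaled 9/7 low-pass coefficients
m_vals = [1, 2, 3, 4, 5]        # example inputs m_1 .. m_5

A = da_matrix(h_q)
sums = adder_array(A, m_vals)   # unsigned row sums; the last row is the sign row

# The sign-row sum enters negated; in 4-bit two's complement -7 is 1001.
sign_row_code = (-sums[-1]) & 0b1111
```

Rows 3 and 7 of the matrix are identical, so their sums need to be computed only once; this is the kind of adder-array redundancy that NEDA exploits.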

Step 4: the adder-array outputs are applied to the MUX
and combined by shift and add. The first output m(1)
is right-shifted by one bit, so

MUX(1) = 0'0110 = Y_P(0)

MUX(2) is then added to MUX(1) to give Y_P(1):

Y_P(1) = 0'0110 + 1110 = 100010 (decimal 34, that is 6 + 2 x 14)

The output Y_P(1) is again right-shifted by one bit and
MUX(3) is added:

Y_P(2) = 0'100010 + 1000 = 1000010 (decimal 66, that is 34 + 4 x 8)

The process continues in the same way for the
remaining adder-array outputs, after which the final
output is

Y_P(7) = 00001111010 = 122

The carry is discarded.
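
The accumulation of Step 4 can be sketched in Python as follows. The sketch weights each adder-array output by a left shift instead of right-shifting the accumulator, which is arithmetically equivalent for integers but is not a model of the hardware datapath; the function name `shift_accumulate` is hypothetical.

```python
def shift_accumulate(row_sums):
    """Combine adder-array outputs: row k weighs 2**k, the sign row weighs -2**(B-1)."""
    *magnitude_rows, sign_row = row_sums
    acc = 0
    trace = []
    for k, s in enumerate(magnitude_rows):
        acc += s << k                        # add the row sum at bit weight 2**k
        trace.append(acc)
    acc -= sign_row << len(magnitude_rows)   # sign row is subtracted
    trace.append(acc)
    return acc, trace


# Unsigned adder-array outputs for m_1..m_5 = 1..5 and coefficients 77, 34, -10, -2, 3.
row_sums = [6, 14, 8, 5, 7, 9, 8, 7]
y_out, partials = shift_accumulate(row_sums)
```

The running values in `partials` start 6, 34, 66, matching Y_P(0), Y_P(1) and Y_P(2), and the final value equals the inner product obtained with multipliers.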

For the 2-D sub-band WT, the outputs $Y_{H1}$ and
$Y_{L1}$ of the 1-D high-pass and low-pass filters are
passed through a series of shift registers, and the
samples are then taken in parallel using the parallel
data access method. The parallel data access method is
used to minimize the memory requirement of the 2-D
sub-band WT.


IV. Simulation Result 

All the design work and experiments for the algorithm
presented in this paper were carried out on Xilinx
14.1i. Xilinx 14.1i has several striking features,
such as a low memory requirement, fast debugging and
low cost. The latest release of the ISE™ (Integrated
Software Environment) design tool lowers the memory
requirement by approximately 27 percent. ISE 14.1i
provides advanced tools such as SmartCompile
technology, which makes better use of the computing
hardware and provides faster timing closure and higher
quality of results, shortening the time to a design
solution. With the aid of this software the program
can be debugged easily. Also included is the newest
release of the ChipScope Pro Serial I/O Toolkit,
providing simplified debugging of high-speed serial
I/O designs for Virtex-7 FX and Virtex-6 LXT and SXT
FPGAs. With the help of this tool, designs can be
developed in the areas of communication, signal
processing and low-power VLSI design.

We functionally verified the 2-D sub-band WT presented
in this paper, including all low-pass and high-pass
filters. Table 1 lists the number of slices, the number
of slice LUTs and the maximum combinational path delay
obtained for different device families. The RTL
(register transfer level) view of the 2-D sub-band tree
structure is shown in Figure 3.

Table 1: Comparison Results for 2-D Sub-band Wavelet Transform on Different Device Families

| Device Family | Number of Slices | Number of Slice LUTs | Maximum Combinational Path Delay |
|---------------|------------------|----------------------|----------------------------------|
| Virtex 7      | 233              | 975                  | 17.411                           |
| Virtex 6      | 232              | 971                  | 18.612                           |
| Spartan 6     | 236              | 975                  | 42.527                           |
| Spartan 3     | 697              | 224                  | 51.837                           |



Figure 3: RTL View of 2-D Sub-band Wavelet Transform


The 2-D sub-band wavelet transform uses two basic
blocks for image compression, namely the low-pass
filter and the high-pass filter. The wavelet transform
has wide application in areas such as image
compression, signal processing and VLSI design. We
propose a 2-D sub-band architecture based on a novel
distributed arithmetic paradigm, named NEDA, for VLSI
implementation of digital signal processing (DSP)
algorithms involving inner products of vectors and
vector-matrix multiplication. We demonstrate that
NEDA is a very efficient architecture with adders as
the main component and free of ROM (memory free),
multiplication, and subtraction. For the adder array, a
systematic approach is introduced to remove the
potential redundancy so that a minimum number of
additions is necessary.